ABSTRACT

Transgenesis is a cornerstone of molecular biology. 
The ability to integrate a specifically engineered 
piece of DNA into the genome of a living system 
is fundamental to our efforts to understand life 
and exploit its implications for medicine, nanotech- 
nology and bioprospecting. However, transgenesis 
has been hampered by position effects and 
multi-copy integration problems, which are mainly 
due to the use of small, plasmid-based transgenes. 
Large transgenes based on native genomic regions 
cloned into bacterial artificial chromosomes (BACs) 
circumvent these problems but are prone to frag- 
mentation. Herein, we report that contrary to 
widely held notions, large BAC-sized constructs do 
not prohibit transposition. We also report the first 
reliable method for BAC transgenesis in human em- 
bryonic stem cells (hESCs). The PiggyBac or 
Sleeping Beauty transposon inverted repeats were 
integrated into BAC vectors by recombineering, 
followed by co-lipofection with the corresponding 
transposase in hESCs to generate robust fluores- 
cent protein reporter lines for OCT4, NANOG, 
GATA4 and PAX6. BAC transposition delivers 
several advantages, including increased fre- 
quencies of single-copy, full-length integration, 
which will be useful in all transgenic systems but 
especially in difficult venues like hESCs. 

INTRODUCTION 

Early work on transgenesis in animals and cell lines in- 
variably used small transgenes, which only rarely achieved 
the intended expression pattern due mainly to position
effects exerted by the genomic integration site or
concatamerization. These major problems have been cir- 
cumvented by the use of large transgenes such as bacterial 
artificial chromosomes (BACs), which carry intact 
genomic regions and often deliver the expected expression 
pattern precisely (1). 

Due to their large size, BACs can accommodate complete 
genes including all cis-regulatory elements in their native
configuration. Consequently, most BAC transgenes are 
indifferent to position effects and often deliver expression 
levels in proportion to the transgene copy number. Many 
BAC libraries have been annotated onto genome browsers 
and are readily available from genome resource providers 
such as CHORI (www.chori.org). Furthermore, BACs 
can be readily modified and mutated using recombineering 
(2-5). These advantages have promoted BACs to the fore- 
front as transgenic tools and now BAC transgenesis 
has been successfully applied to produce a variety of trans- 
genic animals, such as mice, rats, zebrafish and flies (6-9), 
as well as for studies of gene function, molecular comple- 
mentation of mutations, identification of distant regula- 
tory elements and analysis of gene dosage, among other 
applications (1,10-13). Because they often recapitulate ex- 
pression patterns precisely, BAC transgenes are also 
widely used to create gene expression reporters for 
studies during development and differentiation. 

Human embryonic stem cells (hESCs) (14) provide an 
essential venue for studies of human development and 
disease that complements work with model systems such 
as the mouse. Like mouse ESCs (mESCs), they can be 
differentiated in culture to recapitulate aspects of human 
embryology and to serve as paradigms for future medicine 
with cellular therapies. However, they are difficult to mani- 
pulate genetically, particularly for gene targeting (15-17). 

The work reported here began with our efforts to create 
stable hESC reporter lines based on fluorescent protein 
expression driven by stage- and lineage-specific promoters. 
Although we were able to create an OCT4-GFP reporter
line by gene targeting (data not shown), the efficiency of 
homologous recombination in hESCs is low (15,16) and 
our attempts to generate a knock-in for lineage-specific
genes have not been successful. On the other hand, ran- 
domly integrated retroviral and small transgenes often 
undergo transcriptional silencing in hESCs (17-19). 
Consequently, we were attracted by the advantages of 
BAC transgenesis and used the only published method 
for BAC transgenesis in hESCs, which is based on 
nucleofection (20). Unexpectedly, transgene silencing was 
consistently observed, which we correlated with consistent 
failures to obtain integrations of full-length BAC trans- 
genes. To solve the problem of BAC fragmentation, we 
explored the possibility that transposition could be used to 
integrate full-length BAC transgenes. 

DNA transposons are mobile elements that contain 
inverted terminal repeats (ITRs), which are recognition 
sites for a transposase that cuts at the outside end of the 
inverted repeats and moves the excised DNA into a new 
site. Transposons have been used for insertional mutagen- 
esis and gene transfer in many model organisms. However, 
applications in vertebrates were impeded due to the lack 
of active transposons until Tol2 was isolated from the 
Japanese Medaka fish Oryzias latipes (21,22) and Sleeping 
Beauty (SB) was reactivated from the salmon genome by the 
elimination of phylogenetically identified mutations 
(23,24). In 2005, the PiggyBac transposon, isolated from the
cabbage looper moth Trichoplusia ni, was reported to be
active in mammalian cells, including mouse and human cells
(25). Consequently, several options for transposition in 
fish, mouse and human cells are now available. In particular, 
SB and PiggyBac appear most useful (26-31) and increased
activity variants of both have been recently identified (32). 
Notably, transposase-mediated transgenesis has been 
used in cells that are difficult to transfect including human 
haematopoietic stem cells (32,33) and hESCs (34-36). 
Consequently, we were encouraged to examine whether 
BAC transgenesis in hESCs could be facilitated by trans- 
position. However, transposons appear to have severe size 
limitations (37), which have limited their use for large 
transgenes. 

During attempts to integrate large (up to 60 kb) trans- 
genes into Myxococcus and Pseudomonas prokaryotic 
hosts, we encountered problems with fragmentation, 
which we solved by use of transposition (38). Further- 
more, Tol2 transposition has been used to integrate a 
66 kb transgene into zebrafish and mouse genomes (39). 
These studies indicate that fears about the size limitations 
of transposons may be misguided. Herein, we show that 
transposition can be applied to integrate full-length BACs 
larger than 150 kb into hESCs, which has implications for
BAC transgenesis in general and particularly in systems 
that are difficult to work with. 



MATERIALS AND METHODS 

Generation of large reporter constructs and BAC reporters 

The large constructs were made by subcloning from
the respective BACs a region of 19 kb for the hOCT4 gene
and 25 kb for hNANOG into a plasmid with the p15A origin
of replication using recombineering technology
(Supplementary Figure S1) (2,3). For the generation of
large construct or BAC reporters, the green fluorescent 
protein (GFP) or Cherry cassettes were inserted directly 
after the initiating methionine (ATG) of the respective 
gene using recombineering. The PiggyBac or SB 
terminal repeats were inserted into different positions of 
the BAC backbone using a universal recombineering 
strategy applicable to most of the commonly used BAC
vectors (Supplementary Figure S2). The recombineering 
details and list of oligos are presented in Supplementary 
Experimental Procedures. 

hESC culturing 

H7.S6 and H9 hESCs were cultured on mouse embryonic 
fibroblasts (MEFs) in DMEM/F12 medium supplemented 
with 20% Knockout Serum Replacement (Invitrogen) 
and 4 ng/ml basic fibroblast growth factor (bFGF)
(Peprotech) and passaged using 1 mg/ml collagenase IV
(Invitrogen) with the addition of 10 µM Rho-associated kinase
(ROCK) inhibitor Y-27632 (40). For transfections and
differentiation assays, the cells were transferred to 
feeder-free conditions on Matrigel (BD Biosciences) in 
MEF-conditioned hESC medium, and propagated using 
TrypLE (Invitrogen). 

Transfections of hESCs 

Electroporation of large constructs into hESCs was 
performed according to the standard protocol at 320 V 
and 250 µF (15). BAC transfection was performed either
by nucleofection (20) or lipofection. hOCT4-GFP,
hNANOG-GFP, hPAX6-GFP and hGATA4-GFP
BACs were prepared using the Nucleobond BAC 100 kit
(Macherey-Nagel). Nucleofection was done in 100 µl of
solution V using program B-016 according to the manufacturer's
protocol (Amaxa); 5 × 10^6 cells were nucleofected
with 5 µg of the BAC and 300 ng of the transposase
expression or control vector.

For lipofection, hESCs were split onto Matrigel-coated
dishes at a ratio of 1:3, 1 day before transfection. For each
10 cm dish of hESCs, 3, 10, 30 or 50 µg of BAC and 3 or 10 µg
of the transposase expression or control vector were
lipofected using Lipofectamine LTX
(Invitrogen) according to the manufacturer's protocol.

Selection with G418 (100 µg/ml; Invitrogen), puromycin
(0.5 µg/ml; Sigma) or blasticidin (2 µg/ml; Invitrogen)
started 2 days after transfection. After 14 days of selec- 
tion, stable resistant clones were picked to 96-well plates 
and expanded. 

Polymerase chain reaction analysis of hESC clones 

Genomic DNA from the hESC clones was prepared
directly in 96-well plates and used for screening by
polymerase chain reaction (PCR) for the presence of the transposon
inverted repeats and loss of the ampicillin/spectinomycin
cassette, which occurs during transposition. The
clones that contained a BAC integrated by the transposase
according to PCR analysis (PB5+ Amp- PB3+ or SB5+
Spec- SB3+) were checked for BAC copy number by
quantitative PCR (qPCR). Five to six pairs of primers
were designed at random positions along each BAC, with a
distance of ~30-40 kb between primer pairs (listed in
Supplementary Experimental Procedures). The copy
number was calculated by normalizing the Ct values for
each primer pair to the GAPDH gene, relative to wild-type
cells, which contain two allelic copies of each genomic
region.
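
The normalization described above can be sketched as a delta-delta-Ct calculation. This is a minimal illustration only: it assumes 100% amplification efficiency (a doubling per cycle), and the function and parameter names are hypothetical rather than taken from the Supplementary Experimental Procedures.

```python
def copy_number(ct_target_clone, ct_gapdh_clone, ct_target_wt, ct_gapdh_wt,
                wt_copies=2, efficiency=2.0):
    """Estimate the total copies of a genomic region in a clone.

    Assumption: delta-delta-Ct with a fixed per-cycle amplification factor
    (efficiency=2.0 means perfect doubling). Wild-type cells are taken to
    carry two allelic copies of the region.
    """
    d_ct_clone = ct_target_clone - ct_gapdh_clone  # normalize clone to GAPDH
    d_ct_wt = ct_target_wt - ct_gapdh_wt           # normalize wild type to GAPDH
    dd_ct = d_ct_clone - d_ct_wt                   # clone relative to wild type
    return wt_copies * efficiency ** (-dd_ct)


def additional_copies(ct_target_clone, ct_gapdh_clone, ct_target_wt, ct_gapdh_wt,
                      wt_copies=2, efficiency=2.0):
    """Copies contributed by the transgene, i.e. total minus the endogenous alleles."""
    total = copy_number(ct_target_clone, ct_gapdh_clone, ct_target_wt,
                        ct_gapdh_wt, wt_copies, efficiency)
    return total - wt_copies


if __name__ == "__main__":
    # Illustrative Ct values (not measured data): the target amplifies one
    # cycle earlier in the clone than in wild type, GAPDH is unchanged.
    print(additional_copies(24.0, 20.0, 25.0, 20.0))
```

In practice the calculation would be repeated for each primer pair along the BAC, so that every position gets its own estimate of additional copies.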

Splinkerette PCR 

The integration sites of BACs in the clones were deter- 
mined using splinkerette PCR (41). Genomic DNA 
digested with ApoI or BstYI was ligated to 75 nM
splinkerette adaptor (SPLK-A and SPLK-ApoI/BstYI).
The 5'- and 3'-junctions were amplified using nested
PCR and sequenced.

Differentiation of hESCs 

For differentiation of H7.S6 hOCT4-GFP clones, bFGF 
was removed from the culture medium for 10 days and
then GFP expression was analysed by flow cytometry. 

H7.S6 hPAX6-GFP clones and wild-type cells were 
differentiated to neural epithelial cells (42,43). Embryoid 
bodies (EBs) were formed in N2B27 medium with 10 µM
transforming growth factor beta (TGFβ) receptor inhibitor
SB431542 (Tocris). After 8 days, EBs were plated onto
tissue culture dishes coated with 100 µg/ml polyornithine
and 10 µg/ml laminin (Sigma-Aldrich). Expression of
PAX6 and GFP was checked by flow cytometry or im- 
munocytochemistry on Day 14 of differentiation. 

H7.S6 hGATA4-GFP clones and wild-type cells were 
differentiated to definitive endoderm (44). RPMI supplemented
with B27 (1x), 1 mM sodium butyrate, 100 ng/ml
activin A and 25 ng/ml Wnt3a (both from Peprotech) was
used for the first day of differentiation. The next day, Wnt3a
was omitted from the medium, and the cells were cultured
further in RPMI with B27 (1x), 0.5 mM sodium butyrate
and 100 ng/ml activin A. On Day 7, the cells were analysed
for GFP and CXCR4 expression by flow cytometry, and
for marker expression by qPCR.

Immunostaining and microscopy 

Primary antibodies were applied for 1 h at
room temperature (mouse anti-Oct4, 1:50, sc-5279,
Santa Cruz; rabbit anti-Nanog, 1:30, AB5731,
Chemicon) or overnight at +4°C (mouse anti-Pax6,
1:30, Developmental Studies Hybridoma Bank). The cells
were then incubated with secondary antibodies diluted 1:500
(FITC goat anti-rabbit and TRITC goat anti-mouse,
Jackson Immunoresearch Laboratories; Alexa633
goat anti-mouse, Molecular Probes, Invitrogen) for 1 h at
room temperature. Fluorescence images were taken using
a Leica SP5 laser scanning confocal microscope.

Flow cytometry 

The cells were dissociated and fixed in phosphate buffered 
saline (PBS) with 1% formaldehyde. For antibody 
staining, 10^6 live cells were incubated with
R-Phycoerythrin (PE)-conjugated anti-CD184 or
immunoglobulin G (IgG) isotype control (BD
Biosciences) diluted 1:100 in PBS with 2% fetal calf
serum (FCS) for 30 min at +4°C. The cells were analysed
with an LSR II flow cytometer (Becton Dickinson) using
FACSDiva software. The data were processed with
FlowJo software.

RESULTS 

Generation of OCT4 and NANOG fluorescent reporter 
H7.S6 cells using large constructs 

To study pluripotency, lineage commitment and differen- 
tiation pathways in hESCs, we aimed to create a panel of 
reporter cell lines based on the expression of fluorescent 
proteins under control of selected promoters. Using 
recombineering, genomic regions containing the OCT4
and NANOG genes (19 and 25 kb, respectively) were
subcloned from BACs and GFP or mCherry IRES
neomycin cassettes were inserted at the initiating methionine
codon (Figure 1a and Supplementary Figure S1).
Stable H7.S6 hESC clones were established after electro- 
poration and G418 selection. However, we failed to obtain 
clones with uniform expression of the reporters, despite 
the fact that OCT4 and NANOG are expressed in undif- 
ferentiated hESCs. All clones (n = 8) showed mosaic ex- 
pression (59.9-85.0% positive cells by flow cytometry), 
which was further reduced when G418 selection pressure 
was removed. However, the non-fluorescent cells were not 
differentiated, as shown by staining with OCT4 and 
NANOG antibodies (Figure 1b). We also generated
double stable reporter lines after a second round of elec- 
troporation. The double reporter clones H7.S6 
OCT4-mCherry/OCT4-GFP and NANOG-mCherry/ 
NANOG-GFP also displayed mosaic expression of both 
fluorescent transgenes, which only partially overlapped 
(Figure 1c), indicating that even these relatively large
transgenes undergo random silencing. 

Generation of hESC BAC reporters using PiggyBac 
transposition 

To circumvent silencing, we decided to use BAC trans- 
genes. BACs containing OCT4 and NANOG genes were 
modified with the GFP-IRES-neo-pA reporter cassettes 
and stable hESC clones were established using 
nucleofection (20). However, once again we observed 
mosaic expression (data not shown). Because small trans- 
posons have been shown to mediate efficient gene transfer 
in mESC, hESC and human cell lines (25,29,34-36), we 
decided to evaluate whether transposition could be 
applied to BAC transgenesis. Among several transposons, 
we first chose PiggyBac based on its reported efficiencies 
in mammalian cells (26,45). The OCT4 and NANOG 
BACs were further recombineered to insert into the BAC
backbone a cassette that contained the PiggyBac
ITRs (5'-313 bp, 3'-235 bp (46)) flanking an ampicillin
resistance gene. For this purpose, we built a recombineering
cassette in an R6K plasmid so that PacI or PacI/AscI
restriction digestion releases a fragment that will recombine
with most common human and mouse BAC vectors:
pBACe3.6, pBeloBAC11, pTARBAC1, pTARBAC1.3,

Figure 1. H7.S6 OCT4 and NANOG reporter lines created by existing transgenic methods show mosaic expression, highlighting the need for BAC
transgenes. (a) Stable H7.S6 hESC clones carrying 19 kb OCT4 or 25 kb NANOG reporter transgenes to express mCherry or GFP IRES neo reporter
cassettes (see also Supplementary Figure S1) exhibited mosaic expression as determined by fluorescent imaging (left panels) or flow cytometry (FACS
scans at the right). Fluorescent protein expression from the transgenes fell upon removal of G418 selection (compare +G418 with -G418 FACS
panels). (b) Immunostaining of H7.S6 OCT4-mCherry and NANOG-mCherry reporter lines (63x, zoom) for endogenous OCT4 or NANOG
expression showed that most cells expressed the endogenous proteins but many did not express the fluorescent reporter (arrowheads), indicating
that the mosaic expression was due to silencing of the transgene and not differentiation of the cells. (c) Double stable reporter H7.S6
OCT4-mCherry/OCT4-GFP and NANOG-mCherry/NANOG-GFP lines were generated after transfecting the single OCT4 and NANOG reporters
above. Both OCT4 and NANOG double mCherry/GFP reporters showed partially overlapping mosaicism, indicating random silencing of the
reporter. Arrows and asterisk show cells that exhibited only GFP or mCherry fluorescence, respectively.

Figure 2. BAC transgenesis using PiggyBac transposition. (a) Human BACs were modified by recombineering with GFP reporter cassettes that were
inserted directly after the start codon (ATG) and contained a selectable marker expressed either by the gene promoter (for genes expressed in
hESCs), or by the PGK promoter (for genes that are not expressed in hESCs). The figure shows the hNANOG example, which is expressed in hESCs.
A second recombineering step inserted a standardized cassette containing PiggyBac ITRs (PB5 and PB3) flanking the ampicillin resistance gene
(Amp) into the BAC backbone. The PiggyBac ITR/ampicillin cassette was cloned into an R6K vector so that PacI/AscI digestion releases a
restriction fragment that is flanked by homology regions that will recombine with most BAC vectors. The modified BACs were co-transfected
with a PiggyBac transposase (mPBase) expression plasmid into hESCs. (b) Transposition of the BAC by PiggyBac will be full-length, flanked by
the ITRs, and ampR will be omitted. Hence, PCR assays for the presence of PB5 and PB3 with simultaneous loss of Amp indicate transposition.
The copy number of the BAC was determined by quantitative, allele counting PCR (qPCR) on the genomic DNA using 5-6 primer pairs at about
30-40 kb intervals along the BAC (a-e).



pTARBAC2, pTARBAC2.1 and pTARBAC6 (Figure 2a 
and Supplementary Figure S2). Then, the OCT4-GFP and 
NANOG-GFP BACs were transfected into H7.S6 hESCs 
using either nucleofection or lipofection, with or without 
co-transfected codon optimized PiggyBac transposase ex- 
pression plasmid, mPBase (47). Without co-transfected 
mPBase, both transfection methods produced a similar 
number of resistant colonies (22 and 16 clones for 
OCT4-GFP and 5 and 4 clones for NANOG-GFP per 
10^7 transfected cells; Table 1 and Supplementary Table
S1). With nucleofection, co-transfection of mPBase did
not increase the colony number (29 resistant clones for 
OCT4-GFP and 3 clones for NANOG-GFP). However, 
co-lipofection of mPBase produced more colonies for both 
BACs (413 for OCT4-GFP and 14 for NANOG-GFP). 

To check whether the BACs had been integrated by 
transposition, we established a PCR-based strategy for 
colony screening based on primers directed to the 
PiggyBac ITRs (PB5 and PB3) and the ampicillin (Amp) 
gene (Figure 2b). Transposition should integrate the ITRs 
yet exclude the ampicillin gene, whereas random integra- 
tion with or without fragmentation could give any com- 
bination of PCR signals. Most of the colonies established 
by nucleofection, with or without mPBase, contained 
either the whole PiggyBac cassette (PB3+ Amp+ PB5+)
or none of it (PB3- Amp- PB5-). The same was
observed for the lipofection of the BACs alone (Table 1
and Supplementary Table S1). Notably, only lipofection
with mPBase resulted in clones that had the signature of 
transposition (PB3+ Amp- PB5+; 18.8% of OCT4-GFP 
and 75% of NANOG-GFP clones). Characterization of 
clones obtained from nucleofection with or without 
mPBase indicated that all contained only limited pieces 
of the BAC and none of them was due to transposition, 
suggesting that nucleofection breaks BACs. 
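
The screening logic described above (ITRs present, ampicillin gene absent) can be written down as a small decision function. This is a minimal sketch: the function name and category labels are hypothetical, and a real screen would also need positive and negative PCR controls that are not modelled here.

```python
def classify_clone(pb5: bool, amp: bool, pb3: bool) -> str:
    """Classify a clone from three presence/absence PCR results.

    pb5, pb3: signal for the PiggyBac 5' and 3' inverted repeats.
    amp:      signal for the ampicillin gene located between the repeats.
    """
    if pb5 and pb3 and not amp:
        # Both repeats integrated, intervening cassette lost: the signature
        # expected after transposase-mediated integration.
        return "transposition signature"
    if pb5 and amp and pb3:
        # Whole cassette present: integration without excision of Amp.
        return "whole cassette (random integration)"
    if not (pb5 or amp or pb3):
        # No part of the cassette detected, e.g. integration of a fragment.
        return "cassette absent"
    # Any other combination points to a fragmented or mixed integration.
    return "other combination"


if __name__ == "__main__":
    for signals in [(True, False, True), (True, True, True), (False, False, False)]:
        print(signals, "->", classify_clone(*signals))
```

Note that PCR alone cannot exclude a clone carrying both a transposed copy and a randomly integrated copy, which would score as the whole cassette; copy number analysis is needed to resolve such cases.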

Given these encouraging results, we generated BAC 
transposon GFP reporters for the lineage-specific genes 
PAX6 and GATA4 and lipofected them into H7.S6 cells 
with or without mPBase. Because these genes are not ex- 
pressed in hESCs, the IRES could not be used to drive 
neomycin resistance so we used the PGK promoter, which 
consequently provided similar efficiencies for both transfections
(Table 1 and Supplementary Table S1).
Application of mPBase resulted in a 1.7-fold increase
in colony number for both GATA4 and PAX6
BACs, and about one-third of those clones gave the trans- 
positional signature. The use of the hyperactive version of 
PBase, hyPBase (48), further increased the total number of 
colonies and the proportion of transpositional events 
(61.7%). These lipofection results were essentially 

Table 1. Summary of BAC transgenesis and transpositions. H7.S6 and H9 hESC clones were screened for the PiggyBac transpositional signature
by PCR after nucleofection or lipofection with or without a PBase expression construct (either mPBase or hyPBase) as indicated.
All BACs contained a GFP reporter integrated at the initiating ATG codon and the antibiotic resistance gene for selection either under an IRES
(for OCT4 and NANOG) or expressed from the PGK or UbiC promoter (for GATA4 and PAX6). The BAC sizes are indicated. The
PiggyBac inverted repeats either flanked the AmpR gene (1 kb apart) or the whole BAC backbone (9.5 kb apart, indicated as 'ITR backbone').
The data in the transpositional events column show the number and percentage of clones positive for the transpositional signature. 'N analysed'
presents the number of clones that were screened for the transpositional signature and 'N total' presents the yield of clones per 10^7 transfected cells in
that experiment.



reproduced using another hESC line, H9 (Table 1 and
Supplementary Table S1), which is more difficult to transfect
(our unpublished data). The optimum ratio of BAC
transgene to PBase expression vector was evaluated
(Supplementary Table S2). In most cases, 10-30 µg BAC
with 3-10 µg PBase per 10 cm dish (~3 × 10^6 cells), which
corresponds to molar ratios of BAC to PBase from 1:9 to
1:30, gave the most colonies.
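
The conversion from DNA mass to molar ratio follows from the fact that, for double-stranded DNA, moles are proportional to mass divided by length. The sketch below illustrates the arithmetic; the construct sizes used in the example (a 200 kb BAC and a 7 kb transposase plasmid) are assumptions for illustration and are not stated in the text.

```python
def molar_ratio(mass_bac_ug, size_bac_bp, mass_plasmid_ug, size_plasmid_bp):
    """Return moles of plasmid per mole of BAC for two dsDNA preparations.

    The average mass per base pair cancels out, so only the mass/length
    quotient of each construct is needed.
    """
    moles_bac = mass_bac_ug / size_bac_bp              # arbitrary molar units
    moles_plasmid = mass_plasmid_ug / size_plasmid_bp  # same arbitrary units
    return moles_plasmid / moles_bac


if __name__ == "__main__":
    # Hypothetical sizes: 200 kb BAC, 7 kb transposase expression plasmid.
    ratio = molar_ratio(10, 200_000, 3, 7_000)
    print(f"BAC:PBase molar ratio is about 1:{ratio:.1f}")
```

Under these assumed sizes, 10 µg of BAC with 3 µg of plasmid gives roughly 1:8.6, which illustrates why a small mass of transposase plasmid still represents a several-fold molar excess over the BAC.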

Furthermore, we generated a PAX6-GFP reporter BAC 
that contained PiggyBac ITRs flanking the vector 
backbone (ca. 9.5 kb apart from each other). This version 
also contained the Blasticidin resistance gene driven by the 
ubiquitin C promoter at one end of the genomic sequence 
next to the PiggyBac 3'-ITR (Supplementary Figure S2).
Transposition by PBase (mPBase or hyPBase) resulted in
clones that contained an integrated BAC without the vector
backbone, albeit with apparently lower efficiency. Thus,
PiggyBac transposition can be used to exclude integration 
of the prokaryotic vector sequences into the genome. 

Copy number of BAC integrations 

To further characterize the BAC integrations, we analysed 
OCT4-GFP, GATA4-GFP and PAX6-GFP hESC clones 
for transgene copy number using qPCR on genomic 
DNA. Primer pairs were selected across the full-length 
BAC at a distance of 30-40 kb from each other (Figures 
2b and 3). The Ct values were normalized to wild-type
DNA to calculate the number of additional copies 
arising from the integrated transgenes. 

We analysed 17 clones positive for the transpositional 
signature (PB5+ Amp- PB3+; Figure 3). Most clones
showed single-copy signals for all primer pairs indicative
of a single transpositional event (12/17; 70.5%). A further 
three clones showed full-length, multi-copy integrations 
(three, two and four copies) suggesting multiple 
transpositional events. The remaining two clones showed 
signs of a partial integration of a second copy in addition 
to a single full-length copy, suggesting a combination of a 
single transpositional event and a partial random event. In 
addition, we characterized 19 clones that did not present 
the transpositional signature (either PB5+ Amp+ PB3+ or 
PB5- Amp- PB3-; Figure 3b). Most of these clones had
missing or inconsistent signals for at least one primer pair 
indicating random integration of fragments. 
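
The interpretation of per-primer-pair copy numbers used above can be sketched as a simple classifier. The tolerance and the category names are assumptions chosen for illustration; they are not thresholds reported in this study.

```python
def classify_integration(extra_copies, tol=0.3):
    """Classify a clone from the additional copy number at each primer pair.

    extra_copies: additional copies (relative to wild type) measured at
                  successive positions along the BAC.
    tol:          assumed maximum deviation from a whole number for a
                  signal to count as consistent.
    """
    rounded = [round(x) for x in extra_copies]
    if any(abs(x - r) > tol for x, r in zip(extra_copies, rounded)):
        return "inconsistent signal"
    if min(rounded) <= 0:
        # At least one region of the BAC is not present as an extra copy.
        return "fragmented (region missing)"
    if len(set(rounded)) == 1:
        # Same copy number along the whole BAC: full-length integration(s).
        return "single-copy full-length" if rounded[0] == 1 else "multi-copy full-length"
    # Every region present at least once, but some regions over-represented.
    return "full-length plus partial copy"


if __name__ == "__main__":
    print(classify_integration([1.0, 0.9, 1.1, 1.0, 1.05]))
    print(classify_integration([1.0, 1.0, 2.0, 2.1, 1.0]))
```

A uniform profile along the BAC is what a single transpositional event should produce, whereas gaps or steps in the profile are what random integration of fragments would be expected to give.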

Analysis of integration sites of the BACs 

The integration sites were examined by splinkerette PCR 
and sequencing (41). We analysed 21 H7.S6 clones 
(including OCT4-GFP, OCT4-mCherry, NANOG-GFP, 
PAX6-GFP, SOX1-GFP and GATA4-GFP) and 4 H9
clones that were positive for the transpositional signature.
In all cases, the integration locus was continuous on the 5'-
and 3'-sides of a duplicated TTAA sequence (Table 2),
which is characteristic of PiggyBac insertion sites (49). 
As controls we analysed the junctions of PiggyBac ITRs 
in several PB3+ Amp+ PB5+ clones that were established 
by transfection of the BAC without mPBase. As expected, 
the PiggyBac ITRs were followed by the ampicillin gene 
sequence and not genomic DNA, consistent with random 
integration of the BAC (data not shown). 
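
The junction check described above can be sketched as follows: PiggyBac inserts at a TTAA site and duplicates it, so the genomic sequence on the 5' side should end in TTAA and the genomic sequence on the 3' side should begin with TTAA, and removing one copy should restore a continuous genomic locus. The sequences in the example are invented for illustration, and the helper names are hypothetical.

```python
def has_ttaa_duplication(flank5: str, flank3: str) -> bool:
    """True if the genomic flanks show the TTAA target-site duplication.

    flank5: genomic sequence read up to the 5' ITR (should end in TTAA).
    flank3: genomic sequence read from the 3' ITR onward (should start with TTAA).
    """
    return flank5.upper().endswith("TTAA") and flank3.upper().startswith("TTAA")


def preinsertion_site(flank5: str, flank3: str) -> str:
    """Rebuild the locus as it was before insertion by collapsing the duplication."""
    if not has_ttaa_duplication(flank5, flank3):
        raise ValueError("flanks do not show a TTAA target-site duplication")
    return flank5.upper() + flank3.upper()[4:]  # keep a single TTAA copy


def is_continuous(flank5: str, flank3: str, reference: str) -> bool:
    """Check that the rebuilt locus occurs as one stretch in a reference sequence."""
    return preinsertion_site(flank5, flank3) in reference.upper()


if __name__ == "__main__":
    # Made-up sequences, not junctions from Table 2.
    five, three = "GCATGCTTAA", "TTAACGGATC"
    print(preinsertion_site(five, three))
    print(is_continuous(five, three, "aaGCATGCTTAACGGATCtt"))
```

A junction in which the ITR is followed by vector sequence instead of genomic DNA, as in the control clones, would fail this check.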

Previous studies showed that the PiggyBac transposon 
can integrate into any chromosome with a preference for 

Figure 3. BAC transgene copy number and integrity in H7.S6 hESC clones containing OCT4-, GATA4- and PAX6-GFP reporters. Copy number
was evaluated by qPCR assays similar to that illustrated in Figure 2b. (a) Results from 17 clones that were positive for the transpositional signature
(PB5+ Amp- PB3+) are depicted. (b) Results from clones that did not present the transpositional signature, including nine from experiments with
co-lipofected transposase expression plasmid (+mPBase) and 10 from experiments without co-transfected mPBase (-mPBase). The bars show the
additional copy number for each primer pair in the clones and the transpositional signature results are shown below.



AT-rich sequences within 10 kb from transcription units 
(25,29,34). We analysed the sites of integrations of the 
BACs and they were either located in intergenic (n = 14) 
or intronic regions (n = 11) (Table 2) without any obvious 
chromosomal preference or DNA consensus around the 
TTAA site of the integrations (Supplementary Figure S3). 

Functionality of the reporter clones 

The integrity of the reporter BAC transgenes after trans- 
position into hESCs can be functionally evaluated. 
Various clones were examined for the stability and 
pattern of expression before and after differentiation. 
Using single-copy OCT4-GFP clones integrated by 
mPBase (n = 7), we found that all clones uniformly ex- 
pressed GFP (99%) in the undifferentiated state in the 
presence of G418 and notably after >1 month of culture 
without G418 selection pressure (96.13%; Figure 4a). The 
clones were induced to differentiate by bFGF removal 
(50,51) leading to the loss of GFP expression after 8 
days. Hence, the OCT4-GFP BAC transgenes integrated 
by transposition are functional, reliable and not prone to 
position effects. 



To validate lineage reporters for the genes that are not 
expressed in hESCs, we used PAX6-GFP and GATA4- 
GFP BAC transfected clones, created by transposition. 
H7.S6 PAX6-GFP cells were differentiated into neural
epithelial cells in N2B27 through EBs in the presence
of the TGFβ receptor inhibitor (42,43). The attached EBs
formed neural rosettes, which are the characteristic
phenotype of neural progenitor cells. Of the cells, 74.81%
were GFP positive and co-expressed GFP with endogenous
PAX6, as shown by flow cytometry and
immunostaining (Figure 4b).

H7.S6 GATA4-GFP BAC reporter cells were used 
to generate definitive endoderm by treatment with a 
high concentration of activin A (44). Quantitative 
reverse transcription-PCR (qRT-PCR) confirmed 
up-regulation of the endodermal markers GATA4, 
SOX17 and FOXA2 after 7 days of differentiation
(Figure 4c). Most (83.67%) cells were GFP positive and
co-expressed the definitive endoderm marker CXCR4, as
shown by flow cytometry. Interestingly, the dynamics of
the reporter expression closely reproduced recent observa- 
tions on definitive endoderm differentiation, which 



e150 Nucleic Acids Research, 2012, Vol. 40, No. 19 Page 8 of 13



[Table 2. Integration sites of the PiggyBac-transposed BACs in hESC clones (table body not recoverable from the extracted text).]



Page 9 of 13 



Nucleic Acids Research, 2012, Vol. 40, No. 19 e150



[Figure 4 panels. (a) H7.S6 hOCT4-GFP BAC: undifferentiated (+G418, −G418) and differentiated cells; Oct4-GFP, DAPI, overlay; H7.S6 wt versus H7.S6 Oct4-GFP. (b) H7.S6 hPAX6-GFP BAC: undifferentiated and differentiated cells; hPax6-GFP, Pax6-TRITC; H7.S6 wt versus H7.S6 Pax6-GFP. (c) H7.S6 hGATA4-GFP BAC: undifferentiated and differentiated H7.S6 and H7.S6 GATA4-GFP cells; qPCR for OCT4, GATA4, FOXA2, GFP, SOX17 and GAPDH.]


Figure 4. Validation of H7.S6 BAC transposon reporter lines. H7.S6 clones containing verified PiggyBac BAC transpositions for OCT4-GFP,
PAX6-GFP and GATA4-GFP reporters were analyzed. (a) An H7.S6 OCT4-GFP reporter clone showed homogeneous expression of the transgene
as evaluated by immunofluorescence (panels at left) and flow cytometry (+G418 FACS panel). GFP expression was maintained without selection
pressure in undifferentiated cells (−G418 FACS panel). After differentiation for 8 days, GFP expression was uniformly down-regulated (bottom right
FACS panel). (b) An H7.S6 PAX6-GFP reporter clone did not express any fluorescence before differentiation (FACS panel, undifferentiated).
After 10 days of differentiation to neural epithelial cells, most cells in rosette-like structures (74.81%; FACS panel, differentiated) expressed GFP,
which co-localized with endogenous Pax6 by immunofluorescence. (c) An H7.S6 GATA4-GFP reporter clone and the parental H7.S6 line were
differentiated to definitive endoderm cells for 7 days and gene expression was compared with undifferentiated cells for selected genes. CXCR4
expression was evaluated by FACS (left hand panels). OCT4, GATA4, FOXA2, GFP, SOX17 and GAPDH levels were compared by qRT-PCR
(right hand panels).

showed that GATA4 expression precedes CXCR4 (52)
(data not shown). These data support the conclusion
that the BAC transposon reporters are functional and
faithfully reproduce endogenous patterns of gene
expression during differentiation.

Sleeping Beauty BAC transposition 

To determine whether the ability to transpose a very large
cargo was specific to PiggyBac, we undertook BAC
transposon experiments with SB (Supplementary Figure
S4). SB ITRs, separated by a spectinomycin resistance
gene, were inserted into the PAX6-GFP BAC vector and
co-lipofected with SB100Xco transposase (a human
codon-optimized version provided by Zsuzsanna Izsvak).
The combination of BAC and transposase expression
plasmid resulted in a >20-fold increase in colony
number compared with the control co-transfection of the
BAC with a catalytic mutant of the transposase (D3
construct). PCR screening identified clones with the
transpositional signature (23/32; 72%), and most of them
contained a single copy of the BAC. We mapped the
transgene positions in three clones and confirmed a TA
integration site, which is characteristic of SB
transposition (Supplementary Figure S4).

DISCUSSION 

All genomes from bacterial to mammalian have been
bombarded by transposons. Consequently, transposons
have been widely successful as transgenic vehicles and
genetic tools in many cell types and organisms (7,8,24).
However, applications have been almost entirely limited
to transposons with small cargos because the majority of
natural transposons are <2 kb long and larger cargos
appear to suppress transposition. Indeed, many studies
have described an inverse length dependence, with
transpositional efficiency falling by 30-50% for each
extra kilobase of transposon size (37,53-55). In
apparent contradiction to this literature, transpositions
of ~60 kb transgenes were reported using the 'MycoMar'
transposon in prokaryotic hosts by our group (38) and
Tol2 in zebrafish and mice by others (39). Furthermore,
recent model experiments in mESCs reported BAC
transpositions up to 100 kb using PiggyBac (56). The
question therefore arises: if transposition with very
large cargos is possible, how can this apparent
contradiction be resolved? The answer may lie with the
suggestion that transposon length dependence is
determined by the shortest distance between the ITRs,
regardless of the orientation inside or outside the
transposon (57). In this case, the transposons reported
here are efficient because the ITRs are only 1 kb apart
and the transposase is largely indifferent to the
100 kb+ size of the cargo (Supplementary Figure S5).
Consistent with this proposition, we observed similar
efficiencies of transposition for the BACs that contained
the same cassette for the selection of resistant clones
(PGK-neo-pA), regardless of the cargo size difference
(PAX6, 150 kb and GATA4, 196 kb; Figure 2 and Table
1). Furthermore, experiments using BACs with an
increased distance between the ITRs (when placed at
either end of the BAC vector to achieve transgene
integration without the inclusion of prokaryotic vector
sequences) indeed indicate a reduced efficiency (about
2- to 3-fold reduced when the ITRs were spaced about
9.5 kb apart; Table 1 and Supplementary Table S1). This
explanation also accounts for the observation that very
different types of transposases can mediate BAC
transposition [PiggyBac, SB/MycoMar and Tol2 come from
three distinct classes (23,25,58,59)]. It also suggests
that there is no inherent limitation for transposon cargo
size, which is an unanticipated conclusion.
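
The arithmetic behind this argument can be checked with a short calculation. The sketch below is illustrative only: it assumes that the reported 30-50% loss per extra kilobase compounds multiplicatively, which is one possible reading of the cited studies and not a model fitted to the data in this paper. The function names and the 1 kb reference length are hypothetical.

```python
def relative_efficiency(distance_kb, loss_per_kb, reference_kb=1.0):
    """Efficiency relative to a transposon of reference_kb.

    Assumes the per-kilobase loss compounds multiplicatively
    (an assumption made for illustration, not a fitted model).
    """
    extra_kb = max(0.0, distance_kb - reference_kb)
    return (1.0 - loss_per_kb) ** extra_kb


def shortest_itr_distance(cargo_side_kb, backbone_side_kb):
    """On a circular BAC the ITRs are separated by two arcs; return the shorter."""
    return min(cargo_side_kb, backbone_side_kb)


# PAX6 BAC from the text: about 150 kb of cargo on one arc,
# about 1 kb between the ITRs on the other arc.
for loss in (0.3, 0.5):
    by_cargo = relative_efficiency(150.0, loss)
    by_shortest = relative_efficiency(shortest_itr_distance(150.0, 1.0), loss)
    print(f"loss/kb={loss}: cargo-length rule {by_cargo:.1e}, "
          f"shortest-distance rule {by_shortest:.1f}")
```

Under the compounding assumption, a cargo-length rule predicts essentially no transposition of a 150 kb BAC, whereas the shortest-distance rule predicts no penalty at all. The same toy model would overestimate the 2- to 3-fold reduction observed at 9.5 kb ITR spacing, so it should be read as a qualitative illustration only.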

The transposition mechanism implies that BAC transposons
must be covalently closed circular molecules when
transfected. It therefore follows that the quality of the
BAC DNA preparation will affect the ratio of transposon
to random integration events. Breaks in the BAC will tend
to promote random transgenesis whereas covalently
closed circles will favour transposition. In line with this
point, we only achieved BAC transposition by lipofection
and failed when nucleofection was used, apparently
because nucleofection promotes BAC breakage. Whether
electroporation also promotes breakage is unclear. We
note that Li et al. (56) used electroporation to achieve
BAC transpositions. It is also notable that Li et al.
achieved transpositions of BACs with the ITRs separated
by large distances. Because they used more elaborate
protocols, involving transfection of the PiggyBac
expression plasmid 3 days before electroporation of the
BAC, followed by positive and negative selection for
transposition, it is not possible to deduce efficiencies
that could be compared with those reported here. We show
that a much simpler protocol based on co-lipofection and
positive selection only is sufficient to achieve
satisfactory frequencies of BAC transposition when the
ITRs are within 1 or a few kb of each other.

BAC transgenesis by transposition brings three major
advantages over the widely used methods for BAC
transgenesis by random integration (60,61). For random
integration, the BAC can be transfected either after
linearization by restriction digestion or as uncut circles.
Linearization is useful because it determines how the
BAC integrates into the genome. In contrast, the uncut
BAC needs to break, which can occur anywhere, before
integration. However, linearized BACs are more difficult
to handle because they are prone to shearing. BAC
transposition avoids this conundrum because the uncut
BAC is used and the site of integration on the BAC is
determined by the positions of the ITRs. Hence, the
continuity of the integrated BAC can be assured.
Furthermore, as a transposon, the BAC will be integrated
as a single copy, which ensures physiological expression
and avoidance of tandem repeat-associated silencing.
(Note, as shown here, it is possible to obtain cells that
have more than one integrated transposon; however, these
will almost always be independent events resulting in
single copies integrated at different genomic sites.)
Further advantages over random integration are that
transposition increases the frequency of transgenesis,
which is particularly important for difficult systems
such as hESCs, and that the genomic integration site can
be identified by a standard splinkerette assay based on
the ITRs. Because the splinkerette assay is laborious,
for ease of detection we developed a PCR assay for the
transpositional signature (PB5+ Amp− PB3+), which can be
applied for fast screening of clones on a large scale.
All clones positive for the signature that we also
examined by splinkerette sequencing were bona fide
transpositions. Hence, we suggest that BAC transpositions
can be evaluated using this convenient PCR test with high
confidence. Given these many advantages, and because we
show that BAC transposition requires no additional
transfection steps when compared with random transgenesis,
we strongly recommend that all applications of BAC
transgenesis now use transposition.
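
The screening logic can be written down compactly. The following is a minimal sketch, assuming that each clone is scored by three PCR reactions: the PB5 and PB3 ITR amplicons, and an amplicon in the Amp segment, which is assumed here to lie between the ITRs and to be lost on transposition. The data structure, function names and example clones are hypothetical and do not reproduce data from this study.

```python
from typing import Dict, NamedTuple


class PcrResult(NamedTuple):
    pb5: bool  # PB5 amplicon detected
    amp: bool  # Amp amplicon detected (assumed lost after transposition)
    pb3: bool  # PB3 amplicon detected


def has_transposition_signature(result: PcrResult) -> bool:
    """True for the PB5+ Amp- PB3+ pattern described in the text."""
    return result.pb5 and result.pb3 and not result.amp


def signature_frequency(clones: Dict[str, PcrResult]) -> float:
    """Fraction of screened clones that show the signature."""
    if not clones:
        raise ValueError("no clones screened")
    positives = sum(has_transposition_signature(r) for r in clones.values())
    return positives / len(clones)


# Hypothetical example clones (not data from the paper).
example = {
    "clone_1": PcrResult(pb5=True, amp=False, pb3=True),   # signature present
    "clone_2": PcrResult(pb5=True, amp=True, pb3=True),    # Amp retained
    "clone_3": PcrResult(pb5=False, amp=False, pb3=True),  # PB5 end missing
}
print(f"{signature_frequency(example):.2f}")
```

Clones that pass this filter would still need splinkerette mapping whenever the genomic integration site itself is required.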

To facilitate this recommendation, we made PiggyBac
and SB ITR cassettes flanked by homologies to the
BAC vector in R6K plasmids for ease of recombineering.
R6K vectors do not replicate in common Escherichia coli
hosts; hence, the most common source of recombineering
background is eliminated (62). These standardized ITR
cassettes are released by restriction digestion with PacI
and AscI, which permits asymmetric dephosphorylation
so that beta recombination can be applied to enhance
recombineering efficiency (63) (Supplementary Figure
S2). Although this is not necessary for normal
applications, it is relevant for high-throughput
processing in recombineering pipelines (64-66), which can
now be applied to existing BAC resources to rapidly
convert them into transposons.

We generated a panel of fluorescent BAC reporter
hESCs, including SOX1 (data not shown), OCT4,
NANOG, GATA4 and PAX6. In almost all cases, the
clones showed the correct expression properties
quantitatively and stably. For example, all seven OCT4
BAC transposon hESC clones tested showed stable GFP
expression in the undifferentiated state after prolonged
passaging with or without selection pressure. This
underlines the reliability of BAC transgenes when
compared with, for example, the lines generated here
using quite large constructs (19-25 kb). Generation of
reporter lines for genes that are not expressed in hESCs
is a particularly difficult task due to the silencing of
the transgenes, and so far this has been achieved only in
a few cases by gene targeting (67-71). Herein, we show
with PAX6 and GATA4 hESC reporters that BAC transposition
is a reliable way to establish hESC lines even for genes
that are not expressed. To our knowledge, this is the
first transgenic method that can be reliably applied to
access the power of hESCs for developmental and
disease-modelling studies.

Our work brings a further advance in BAC transgenesis 
in different organisms and cell types, especially those that 
are recalcitrant to genetic modifications such as hESCs. 
This technology includes the standardized insertion 
of transposon ITR cassettes into the BAC backbone 
using recombineering, co-delivery of the BAC with the 
transposase and fast PCR screening for transpositions. 
These steps are straightforward and achieve single-copy, 
full-length, BAC integrations in genomic loci that can be 
readily mapped, all of which we believe sets a new ideal for 
transgenesis. Our work also unravelled a fundamental 
property of transposition regarding the impact of cargo 
size that has been underestimated in the past. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online:
Supplementary Tables 1 and 2, Supplementary Figures 1-5,
Supplementary Methods and Supplementary References
[72-74].